Hamid Kabir Findings

Loading the necessary libraries

First we need to load some libraries.

Loading the Data

First, we load the data from our CSV file. Because we are looking for the average number of new cases and deaths in each month, the Date column must be split into three columns (day, month, year) for the analysis.
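A minimal sketch of the loading step follows. The report does not give the CSV file name, so the sketch reads from an inline string instead; the toy rows and the reduced set of columns are assumptions made for illustration only.

```r
# Inline stand-in for the CSV file (the real file name is not given in the report).
csv_text <- "Date,Name.of.State...UT,Total.Confirmed.cases,Death,New.cases
2020-01-30,Kerala,1,0,0
2020-01-31,Kerala,1,0,0
2020-02-01,Kerala,2,0,1"

# stringsAsFactors = FALSE keeps text columns as character vectors.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

head(df)  # first rows of the data
str(df)   # column names and types
```

With a real file, `read.csv("path/to/file.csv")` would replace the `text =` argument.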

A look at the first few rows of our data

        Date Name.of.State...UT Latitude Longitude Total.Confirmed.cases Death
1 2020-01-30             Kerala  10.8505   76.2711                     1     0
2 2020-01-31             Kerala  10.8505   76.2711                     1     0
3 2020-02-01             Kerala  10.8505   76.2711                     2     0
4 2020-02-02             Kerala  10.8505   76.2711                     3     0
5 2020-02-03             Kerala  10.8505   76.2711                     3     0
6 2020-02-04             Kerala  10.8505   76.2711                     3     0
  Cured.Discharged.Migrated New.cases New.deaths New.recovered
1                         0         0          0             0
2                         0         0          0             0
3                         0         1          0             0
4                         0         1          0             0
5                         0         0          0             0
6                         0         0          0             0

The summary of the data

     Date           Name.of.State...UT    Latitude       Longitude    
 Length:4692        Length:4692        Min.   : 0.00   Min.   : 0.00  
 Class :character   Class :character   1st Qu.:18.11   1st Qu.:76.27  
 Mode  :character   Mode  :character   Median :23.94   Median :79.02  
                                       Mean   :23.19   Mean   :81.45  
                                       3rd Qu.:28.22   3rd Qu.:85.31  
                                       Max.   :34.30   Max.   :94.73  
 Total.Confirmed.cases    Death           Cured.Discharged.Migrated
 Min.   :     1        Length:4692        Min.   :     0.0         
 1st Qu.:    39        Class :character   1st Qu.:     9.0         
 Median :   619        Mode  :character   Median :   197.5         
 Mean   : 11394                           Mean   :  6908.1         
 3rd Qu.:  5233                           3rd Qu.:  2736.0         
 Max.   :468265                           Max.   :305521.0         
   New.cases         New.deaths New.recovered    
 Min.   :    0.0   Min.   :0    Min.   :   -1.0  
 1st Qu.:    1.0   1st Qu.:0    1st Qu.:    0.0  
 Median :   26.0   Median :0    Median :    8.0  
 Mean   :  418.6   Mean   :0    Mean   :  283.1  
 3rd Qu.:  210.2   3rd Qu.:0    3rd Qu.:  119.0  
 Max.   :18366.0   Max.   :0    Max.   :13401.0  

The Structure of the data.

'data.frame':   4692 obs. of  10 variables:
 $ Date                     : chr  "2020-01-30" "2020-01-31" "2020-02-01" "2020-02-02" ...
 $ Name.of.State...UT       : chr  "Kerala" "Kerala" "Kerala" "Kerala" ...
 $ Latitude                 : num  10.9 10.9 10.9 10.9 10.9 ...
 $ Longitude                : num  76.3 76.3 76.3 76.3 76.3 ...
 $ Total.Confirmed.cases    : num  1 1 2 3 3 3 3 3 3 3 ...
 $ Death                    : chr  "0" "0" "0" "0" ...
 $ Cured.Discharged.Migrated: num  0 0 0 0 0 0 0 0 0 0 ...
 $ New.cases                : int  0 0 1 1 0 0 0 0 0 0 ...
 $ New.deaths               : int  0 0 0 0 0 0 0 0 0 0 ...
 $ New.recovered            : int  0 0 0 0 0 0 0 0 0 0 ...

As we can see, the Death column is of type character, so we need to convert it to numeric in order to compute averages and visualize the data.

'data.frame':   4692 obs. of  10 variables:
 $ Date                     : chr  "2020-01-30" "2020-01-31" "2020-02-01" "2020-02-02" ...
 $ Name.of.State...UT       : chr  "Kerala" "Kerala" "Kerala" "Kerala" ...
 $ Latitude                 : num  10.9 10.9 10.9 10.9 10.9 ...
 $ Longitude                : num  76.3 76.3 76.3 76.3 76.3 ...
 $ Total.Confirmed.cases    : num  1 1 2 3 3 3 3 3 3 3 ...
 $ Death                    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Cured.Discharged.Migrated: num  0 0 0 0 0 0 0 0 0 0 ...
 $ New.cases                : int  0 0 1 1 0 0 0 0 0 0 ...
 $ New.deaths               : int  0 0 0 0 0 0 0 0 0 0 ...
 $ New.recovered            : int  0 0 0 0 0 0 0 0 0 0 ...
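The conversion can be sketched in base R as below. The toy column and its stray entry `"0#"` are hypothetical; they only illustrate how a single non-numeric value can make R read a whole column as character, and how that value turns into `NA` on conversion.

```r
# Toy column: one hypothetical non-numeric entry ("0#") forces the character type.
df <- data.frame(Death = c("0", "2", "0#", "5"), stringsAsFactors = FALSE)
str(df$Death)

# as.numeric() converts the digit strings and turns anything it cannot parse
# into NA (it also raises a coercion warning, silenced here).
df$Death <- suppressWarnings(as.numeric(df$Death))
str(df$Death)
```

Checking for `NA` values right after a conversion like this is a cheap way to find entries that were not clean numbers.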

Data Cleaning

We will look for missing values in the whole dataset. We will repeat this data cleaning step for every subset of the dataset we take to answer a question.

                     Date        Name.of.State...UT                  Latitude 
                        0                         0                         0 
                Longitude     Total.Confirmed.cases                     Death 
                        0                         0                         1 
Cured.Discharged.Migrated                 New.cases                New.deaths 
                        0                         0                         0 
            New.recovered 
                        0 

The Death column has a missing value in one row, which we fill with 0.
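Counting the missing values per column and filling the gap can be sketched as follows. The data frame is a toy stand-in, not the report's data.

```r
# Toy data with a single missing Death value.
df <- data.frame(
  New.cases = c(0L, 1L, 4L),
  Death     = c(0, NA, 3)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(df))
print(na_counts)

# Replace the missing Death value with 0.
df$Death[is.na(df$Death)] <- 0
```

Filling with 0 keeps the row in later calculations but pulls averages slightly downward; dropping the row would be the alternative.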

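Here we create a new variable, df2, using the mutate function to convert the Date column into three separate columns (day, month, year). We do this so that we can average the Death and New.cases columns by month. Note that in the output below the day and year labels are swapped: the column labelled day holds the year (2020) and the column labelled year holds the day of the month.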

        Date Name.of.State...UT Latitude Longitude Total.Confirmed.cases Death
1 2020-01-30             Kerala  10.8505   76.2711                     1     0
2 2020-01-31             Kerala  10.8505   76.2711                     1     0
3 2020-02-01             Kerala  10.8505   76.2711                     2     0
4 2020-02-02             Kerala  10.8505   76.2711                     3     0
5 2020-02-03             Kerala  10.8505   76.2711                     3     0
6 2020-02-04             Kerala  10.8505   76.2711                     3     0
  Cured.Discharged.Migrated New.cases New.deaths New.recovered  day month year
1                         0         0          0             0 2020     1   30
2                         0         0          0             0 2020     1   31
3                         0         1          0             0 2020     2    1
4                         0         1          0             0 2020     2    2
5                         0         0          0             0 2020     2    3
6                         0         0          0             0 2020     2    4
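In the output above, the column labelled `day` holds 2020 and the column labelled `year` holds the day of the month, which suggests the labels were swapped during the split. A minimal sketch of the split with each component under its matching name is shown below. It uses base R `as.Date()` and `format()` rather than the `mutate` call mentioned in the text, and the two-row data frame is hypothetical.

```r
df <- data.frame(Date = c("2020-01-30", "2020-02-01"), New.cases = c(0L, 1L),
                 stringsAsFactors = FALSE)

# Parse the ISO-formatted strings once, then pull out each component.
d <- as.Date(df$Date, format = "%Y-%m-%d")

df2 <- df
df2$day   <- as.integer(format(d, "%d"))  # day of the month
df2$month <- as.integer(format(d, "%m"))  # month number
df2$year  <- as.integer(format(d, "%Y"))  # four-digit year
print(df2)
```

Only the `month` column is used in the monthly averages, so the swapped labels do not affect those results.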

Drop the Date column from df2

  Name.of.State...UT Latitude Longitude Total.Confirmed.cases Death
1             Kerala  10.8505   76.2711                     1     0
2             Kerala  10.8505   76.2711                     1     0
3             Kerala  10.8505   76.2711                     2     0
4             Kerala  10.8505   76.2711                     3     0
5             Kerala  10.8505   76.2711                     3     0
6             Kerala  10.8505   76.2711                     3     0
  Cured.Discharged.Migrated New.cases New.deaths New.recovered  day month year
1                         0         0          0             0 2020     1   30
2                         0         0          0             0 2020     1   31
3                         0         1          0             0 2020     2    1
4                         0         1          0             0 2020     2    2
5                         0         0          0             0 2020     2    3
6                         0         0          0             0 2020     2    4
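Dropping a column can be sketched in base R as below; the one-row data frame is a toy stand-in.

```r
df2 <- data.frame(Date = "2020-01-30", New.cases = 0L, month = 1L,
                  stringsAsFactors = FALSE)

# Assigning NULL to a column removes it from the data frame.
df2$Date <- NULL
names(df2)
```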

Questions

Q1: The average number of new cases and deaths over each month.

Q2: Which 10 states have the highest average number of new cases and deaths?

Q1: The average number of new cases and deaths in each month.

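To see the average number of new cases and deaths per month, we split the Date column into year, month and day, then group by month and take the mean of the observations. Note that Death appears to be a running total per state (like Total.Confirmed.cases), so its monthly mean is an average of cumulative counts rather than of new deaths.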

# A tibble: 8 x 3
  month New.cases    Death
  <dbl>     <dbl>    <dbl>
1     1    0         0    
2     2    0.0690    0    
3     3    2.63      0.408
4     4   33.8      14.0  
5     5  139.       87.0  
6     6  377.      300.   
7     7 1110.      748.   
8     8 1551.     1102.   

Descriptive statistics for the New cases and deaths for Q1

month     New.cases         Death
    1     0.0000000     0.0000000
    2     0.0689655     0.0000000
    3     2.6323232     0.4080808
    4    33.8271078    13.9658485
    5   139.1443798    87.0087209
    6   377.3152709   299.9546798
    7  1110.1172840   748.3755144
    8  1550.7904762  1102.1047619
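The group-by-month average can be sketched with base R `aggregate()`. The tibble output above suggests the report used dplyr, so this is a swapped-in equivalent, and the numbers are toy values.

```r
df2 <- data.frame(
  month     = c(3, 3, 4, 4, 4),
  New.cases = c(2, 4, 30, 36, 39),
  Death     = c(0, 1, 10, 14, 18)
)

# One row per month, holding the mean of each measure within that month.
monthly <- aggregate(cbind(New.cases, Death) ~ month, data = df2, FUN = mean)
print(monthly)
```

With the formula interface, `aggregate()` drops rows that have `NA` in any listed column, which is one reason to fill missing values before averaging.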

Q2: Which 10 states have the highest average number of new cases and deaths?

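To see the mean new cases and deaths of the top 10 states, we group by state and take the mean of the observations. Telangana*** appears as a separate row because the source data contains that label as a distinct state name.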

# A tibble: 10 x 3
   Name.of.State...UT New.cases Death
   <chr>                  <dbl> <dbl>
 1 Delhi                   911. 1112.
 2 Gujarat                 490. 1013.
 3 Karnataka              1030.  348.
 4 Madhya Pradesh          265.  351.
 5 Maharashtra            3185. 3998.
 6 Tamil Nadu             1835.  750.
 7 Telangana              1348.  344.
 8 Telangana***              0   455 
 9 Uttar Pradesh           687.  375.
10 West Bengal             655.  398.

Descriptive statistics for the top 10 state of New cases and deaths for Q2

Name.of.State...UT  New.cases      Death
Delhi                910.5909  1111.5390
Gujarat              490.1985  1013.1618
Karnataka           1030.2585   348.4422
Madhya Pradesh       264.6667   351.4148
Maharashtra         3185.4898  3997.6054
Tamil Nadu          1835.2953   750.1007
Telangana           1347.6471   343.8824
Telangana***           0.0000   455.0000
Uttar Pradesh        686.7237   374.7303
West Bengal          654.6797   398.0703
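A minimal sketch of a per-state average followed by a top-n selection is given below. The report does not say how the top 10 were chosen, so ranking by mean `New.cases` is an assumption; the state names and values are hypothetical, and `n` is set to 2 only to fit the toy data.

```r
df <- data.frame(
  Name.of.State...UT = c("A", "A", "B", "B", "C", "C"),
  New.cases          = c(10, 20, 100, 300, 1, 3),
  Death              = c(1, 3, 40, 60, 0, 0),
  stringsAsFactors   = FALSE
)

# Mean of each measure per state.
by_state <- aggregate(cbind(New.cases, Death) ~ Name.of.State...UT,
                      data = df, FUN = mean)

# Sort by mean new cases, largest first, and keep the top n (10 in the report).
n <- 2
top <- head(by_state[order(by_state$New.cases, decreasing = TRUE), ], n)
print(top)
```

Cleaning labels such as `Telangana***` before grouping would merge those records into their parent state instead of giving them a row of their own.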

Ola Lanlehin Findings


The dataset was downloaded from Kaggle. It is a Covid-19 dataset for the states of India, with variables covering those who tested positive for the virus (denoted as confirmed cases), deaths, the dates on which the virus was detected, the number of recoveries, geographic location (given as longitude and latitude), and more. It provides a rich historical record to analyse and draw insight from.

The graph below summarises the number of Covid cases per state. The insight from the graph is that Covid-19 is widespread across India and that the case counts of most states are not far apart, except for two or three states.

The graph below investigates the pattern in the distribution of Total Confirmed Cases as it relates to those who have survived and recovered from the virus. The relationship seems to suggest that as Covid cases rise, recoveries drop. A possible explanation is that new variants slow recoveries.

The graph below shows Covid cases and their outliers. Some states show regular (arithmetic) increases while others show geometric increases, as denoted by the number of outliers plotted above their box plots. The sudden jumps in numbers may be explained by population density.
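The points drawn above a box plot can also be listed numerically. The sketch below uses `boxplot.stats()`, which applies the same 1.5 × IQR whisker rule that `boxplot()` uses by default; the case counts are hypothetical.

```r
# Hypothetical daily case counts for one state, with one sudden jump.
cases <- c(5, 8, 10, 12, 15, 18, 20, 400)

# boxplot.stats() returns the five-number summary ($stats) and the
# values falling beyond the whiskers ($out).
stats <- boxplot.stats(cases)
stats$out
```

Running this per state and counting `length(stats$out)` gives the number of outliers that the text compares between states.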

Kishan Findings

State-wise Total Cases

The bar graph below shows the total cases in Indian states, together with how many of those cases ended in death or were cured. As the graph shows, Maharashtra had over 10 million cases between 30-01-2020 and 06-08-2020, the highest of any state. Its active, cured and death counts are around 6.458M, 8.146M and 587.648k respectively, also higher than those of the other states. Tamil Nadu is second highest for active, death and cured cases. The Union Territory of Chandigarh has only 2 active cases and 0 deaths and cured cases. The data also shows 0 deaths and 0 cured cases in the Union Territory of Ladakh and the Union Territory of Jammu and Kashmir.

Death and cured rates as a share of total cases in India

The graph below shows the death rate and recovery (cured) rate for each state, allowing confirmed cases, cured rate and death rate to be compared across states. Gujarat has the highest death rate, at 5%. Telangana** has the highest cured rate, at 76.9%. Mizoram and Puducherry have a 0% death rate out of 13335 and 82967 confirmed cases respectively. Meghalaya has the lowest cured rate, at 28.1%. The comparison shows clearly the death, cured and active rates of each state.
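The report does not state the formula behind these rates. The sketch below assumes each rate is that outcome's share of confirmed cases, expressed in percent; the state names and counts are hypothetical.

```r
states <- data.frame(
  state     = c("X", "Y"),
  confirmed = c(2000, 500),
  deaths    = c(100, 0),
  cured     = c(1500, 140),
  stringsAsFactors = FALSE
)

# Assumed definition: rate = outcome / confirmed cases * 100, rounded to 1 decimal.
states$death_rate <- round(100 * states$deaths / states$confirmed, 1)
states$cured_rate <- round(100 * states$cured  / states$confirmed, 1)
print(states)
```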

Date-wise changes in the number of cases

The scatter plot shows, date by date, the number of active, cured and death cases. According to the dataset, there were 1.378M total cases on 23-05-2020, after which cured cases surpass active cases. The total number of deaths on 23-05-2020 is 67.317k. By 06-08-2020, total cured cases are around 32.413M out of all confirmed cases. The plot gives a simple date-wise picture of the growth in cured cases of coronavirus in India.
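Finding the date on which cured cases overtake active cases can be sketched as follows. The definition of active cases (confirmed minus cured minus deaths) is an assumption, since the report does not state it, and the column names and numbers are hypothetical.

```r
# Two hypothetical state rows per date.
df <- data.frame(
  Date      = rep(c("2020-05-01", "2020-05-02", "2020-05-03"), each = 2),
  confirmed = c(100, 50, 120, 60, 130, 70),
  cured     = c(30, 10, 50, 25, 80, 40),
  deaths    = c(5, 1, 6, 2, 7, 3),
  stringsAsFactors = FALSE
)

# Sum the state rows for each date to get country-wide totals.
daily <- aggregate(cbind(confirmed, cured, deaths) ~ Date, data = df, FUN = sum)

# Assumed definition: active = confirmed - cured - deaths.
daily$active <- daily$confirmed - daily$cured - daily$deaths

# First date on which cured cases exceed active cases.
crossover <- daily$Date[which(daily$cured > daily$active)[1]]
print(crossover)
```

Dates are kept as ISO-formatted strings here, which sort in calendar order, so the first match is also the earliest date.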